Abstract

We present a general architecture for incremental interaction
between modules in a speech-to-intention continuous understanding
dialogue system. This architecture is then instantiated in the form
of an incremental parser which receives suitability feedback on NP
constituents from a reference resolution module. Oracle results
indicate that perfect NP suitability judgments can provide a
labelled-bracket error reduction of as much as 42% and an efficiency
improvement of 30%. Preliminary experiments in which the parser
incorporates feedback judgments based on the set of referents found
in the discourse context achieve a maximum error reduction of 9.3%
and efficiency gain of 4.6%. The parser is also able to
incrementally instantiate the semantics of underspecified pronouns
based on matches from the discourse context. These results suggest
that the architecture holds promise as a platform for incremental
parsing supporting continuous understanding.

1  Introduction 

Humans process language incrementally, as has been shown by classic
psycholinguistic discussions surrounding the garden-path phenomenon
and parsing preferences (Altmann and Steedman, 1988; Konieczny,
1996; Phillips, 1996). Moreover, a variety of eye-tracking
experiments (Cooper, 1974; Tanenhaus and Spivey, 1996; Allopenna et
al., 1998; Sedivy et al., 1999) suggest that complex semantic and
referential constraints are incorporated on an incremental basis in
human parsing decisions.

Computational parsers, however, still tend to operate on an entire
sentence at a time, despite the advent of speech-to-intention
dialogue systems such as Verbmobil (Kasper et al., 1996; Noth et
al., 2000; Pinkal et al., 2000), Gemini (Dowding et al., 1993;
Dowding et al., 1994; Moore et al., 1995) and TRIPS (Allen et al.,
1996; Ferguson et al., 1996; Ferguson and Allen, 1998).
Naturalness, robustness, and interactivity are goals of such
systems, but control flow is typically the sequential execution of
modules, each operating on the output of its predecessor; only
after the entire sentence has been parsed do higher-level modules
such as intention recognition and reference resolution get
involved.

In contrast to this sequential model is the continuous
understanding approach, in which all levels of language analysis
occur simultaneously, from speech recognition to intention
recognition. As well as being psycholinguistically motivated,
continuous understanding models offer potential computational
advantages, including accuracy and efficiency improvements for
real-time spoken language understanding and better support for the
spontaneities of natural human speech. Continuous understanding is
necessary if the system is to respond before the entire utterance
is analyzed, a prerequisite for incremental confirmation and
clarification. The major computational advantage of continuous
understanding models is that high-level expectations and feedback
should be able to influence the search of lower-level processes,
thus leading to a focused search through hypotheses that are
plausible at all levels of processing.

One of the major current applications of parsers that operate
incrementally is for language modelling in speech recognition
(Brill et al., 1998; Jelinek and Chelba, 1999). This work is
important not only for its ability to improve performance on the
speech recognition task; it also models the interactions between
speech recognition and parsing in a continuous understanding
system. Our research attempts to further the quest for continuous
understanding by moving one step up the hierarchy, building an
incremental parser which is the advisee rather than the advisor.

We begin by presenting a general architecture for incremental
interaction between the parser and higher-level modules, and then
discuss a specific instantiation of this general architecture in
which a reference resolution module provides feedback to the parser
on the suitability of noun phrases. Experiments with incremental
feedback from a reference resolution module and an NP suitability
oracle are reported, and the ability of the implementation to
incrementally instantiate semantically underspecified pronouns is
outlined. We believe this research provides an important start
towards developing end-to-end continuous understanding models.

Report Documentation Page (Standard Form 298): Title: Incremental
Parsing with Reference Interaction. Performing organization:
Department of Computer Science, University of Rochester, Rochester,
NY 14627. Dates covered: 00-00-2006 to 00-00-2006. Distribution:
approved for public release; distribution unlimited. Security
classification: unclassified.

Figure 1: A General Architecture for Incremental Parsing

2  An  Incremental  Parsing  Architecture 

Many current parsers fall into the class of history-based grammars
(Black et al., 1992). The independence assumptions of these models
make the parsing problem both stochastically and computationally
tractable, but represent a simplification and may therefore be a
source of error. In a continuous understanding framework,
higher-level modules may have additional information that suggests
loci for improvement, recognizing either invalid independence
assumptions or errors in the underlying probability model.

We have designed a general incremental parsing architecture (Figure
1) in which the Client, a dynamic programming parser, performs its
calculations, the results of which are incrementally passed on via
a Mediator to an Advisor with access to higher-level information.
This higher-level Advisor sends feedback to the Mediator, which has
access to the Client’s chart, and which then surreptitiously
changes and/or adds to the chart in order to make the judgments
conform more closely to those of the Advisor. The parser, whose
chart has (unbeknownst to it) been changed, then simply calculates
chart expansions for the next word, naively expanding the currently
available (and possibly modified) hypotheses.

This architecture is general in that neither the Mediator nor the
Advisor has been specified; either of these modules can be
instantiated in any number of ways within the general framework.
The typical dynamic programming component will function in very
much the same way that it does in the vanilla algorithm, except
that the chart in which partial results are recorded may be
modified between time steps. The Client can be any system which
uses dynamic programming to efficiently encode independence
assumptions, so long as it provides the Mediator with the ability
to modify chart probabilities and add chart entries; otherwise the
original parser can remain untouched. By having the Mediator
perform these modifications rather than the Advisor, we preserve
modularity: in this architecture the Advisor need not be aware of
the specific implementation of the Client, although depending on
the type of advice provided, it may need access to the underlying
grammar. The Mediator isolates the Advisor and Client from each
other as well as determining how the feedback will be introduced
into the Client’s chart.
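The division of labour among the three components can be sketched in a few lines of Python. This is a minimal illustration, not the TRIPS implementation: the class names `Entry`, `Client` and `Mediator`, the function `toy_advisor`, and the policy of rescaling a probability by the Advisor's judgment are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Entry:
    """One chart entry: a labelled span with a probability."""
    label: str
    span: Tuple[int, int]
    prob: float


class Client:
    """Stand-in for a chart parser; it only appends entries to its chart."""

    def __init__(self) -> None:
        self.chart: List[Entry] = []

    def advance(self, position: int, label: str, prob: float) -> Entry:
        entry = Entry(label, (position, position + 1), prob)
        self.chart.append(entry)
        return entry


class Mediator:
    """The only component allowed to modify the Client's chart."""

    def __init__(self, client: Client,
                 advisor: Callable[[str, Tuple[int, int]], float]) -> None:
        self.client = client
        self.advisor = advisor

    def review(self, entry: Entry) -> None:
        # The Advisor sees only a label and a span, never the chart itself.
        judgment = self.advisor(entry.label, entry.span)
        # Hypothetical policy: rescale the entry's probability in place.
        entry.prob *= judgment


def toy_advisor(label: str, span: Tuple[int, int]) -> float:
    """Hypothetical Advisor: mildly disprefers NPs, neutral otherwise."""
    return 0.5 if label == "NP" else 1.0


client = Client()
mediator = Mediator(client, toy_advisor)
for i, (label, prob) in enumerate([("NP", 0.8), ("VP", 0.6)]):
    mediator.review(client.advance(i, label, prob))
```

The Client never calls the Advisor and the Advisor never touches the chart, so either could be swapped out without changing the other.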

Stoness (2004) identifies two broad categories of subversion - our
term for the Mediator’s surreptitious modification of the Client’s
chart - as outlined below:

•  Heuristic Subversion: the Mediator uses the Advisor’s feedback
as heuristic information, affecting the search sequence but not the
probabilities calculated for a given hypothesis; and

•  Chart Subversion: the Mediator is free to modify the Client’s
chart as necessary, but does not directly affect the search
sequence of the Client (except insofar as this is accomplished by
the modifications to the chart).

The two types of subversion have very different properties.
Heuristic subversion will affect the set of analyses which is
output by the parser, but each of those analyses will have exactly
the same probability score as under the original parser; the
effects of the Advisor are essentially limited to determining which
hypotheses remain within the beam, or the order in which hypotheses
are expanded, depending on whether the underlying parser uses a
beam search or an agenda. Chart subversion, on the other hand, will
actually change the scores assigned to analyses, resulting in a new
probability distribution. Heuristic subversion is considerably less
powerful, but more stable; the effects of chart subversion can be
fairly chaotic, especially if care is not taken to avoid feedback
loops. Stoness (2004) outlines conditions under which the effects
of chart subversion are predictable, becoming broadly equivalent to
an incremental version of a post-hoc re-ranking of the Client’s
output hypotheses.
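The contrast between the two kinds of subversion can be made concrete with a toy beam. The sketch below is illustrative only: the function names, the multiplicative use of the advice, and the example scores are assumptions, not details taken from Stoness (2004).

```python
from typing import Dict


def heuristic_subversion(scores: Dict[str, float],
                         advice: Dict[str, float],
                         beam_width: int) -> Dict[str, float]:
    """Use advice only to rank hypotheses; surviving scores are unchanged."""
    ranked = sorted(scores,
                    key=lambda h: scores[h] * advice.get(h, 1.0),
                    reverse=True)
    return {h: scores[h] for h in ranked[:beam_width]}


def chart_subversion(scores: Dict[str, float],
                     advice: Dict[str, float],
                     beam_width: int) -> Dict[str, float]:
    """Rewrite the scores themselves, then apply the ordinary beam."""
    rescored = {h: scores[h] * advice.get(h, 1.0) for h in scores}
    ranked = sorted(rescored, key=lambda h: rescored[h], reverse=True)
    return {h: rescored[h] for h in ranked[:beam_width]}


scores = {"A": 0.5, "B": 0.4, "C": 0.1}
advice = {"A": 0.5, "B": 1.0, "C": 1.0}  # the Advisor disprefers A

kept_heuristic = heuristic_subversion(scores, advice, beam_width=2)
kept_chart = chart_subversion(scores, advice, beam_width=2)
```

In both cases B now precedes A, but only under chart subversion does A leave the step with a different score (0.25 rather than 0.5), which is what defines a new probability distribution.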

Further details on the general architecture, including properties
of various modes of feedback integration, a discussion of the
relationship between incremental parsing and parse re-ranking, the
possibilities of multiple Advisors working in combination, and
provisions in the model for asynchronous feedback are available in
a University of Rochester Technical Report (Stoness, 2004).

3  Instantiating  the  Architecture 

Working in the context of TRIPS, an existing task-oriented dialogue
system, we have modified the existing parser and reference
resolution modules so that they communicate incrementally with each
other. This models the early incorporation of reference resolution
information seen in humans (Chambers et al., 1999; Allopenna et
al., 1998), and allows reference resolution information to affect
parsing decisions.

For example, in “Put the apple in the box in the corner” there is
an attachment ambiguity. Reference resolution can determine the
number of matches for the noun phrase “the apple” incrementally; if
there is a single match, the parser would expect this to be a
complete NP, and prefer the reading where the box is in the corner.
If reference returns multiple matches for “the apple”, the parser
would expect disambiguating information, and prefer a reading where
additional information about the apple is provided: in this case,
the NP “the apple in the box”.
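The preference described in this example can be written as a small decision rule. The function below is a hypothetical sketch; in particular, returning no preference when there are zero matches is an assumption, since the example only covers the single-match and multiple-match cases.

```python
from typing import Optional


def preferred_np(base_np: str, extended_np: str,
                 match_count: int) -> Optional[str]:
    """Pick the NP reading favoured by the number of referent matches."""
    if match_count == 1:
        # A unique referent suggests the base NP is already complete.
        return base_np
    if match_count > 1:
        # Several referents suggest that disambiguating material follows.
        return extended_np
    # Assumption: with no matches, reference offers no preference.
    return None


single = preferred_np("the apple", "the apple in the box", match_count=1)
several = preferred_np("the apple", "the apple in the box", match_count=3)
```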

With solid feedback from reference, it should be possible to remove
some of the ambiguity inherent in the search process within the
parser. This will simultaneously guide the search to the most
likely region of the search space, improving accuracy, and delay
the search of unlikely regions, improving efficiency. Of course,
this comes at the cost of some communication overhead and
additional reference resolution. Ideally, the overall improvement
in the parser’s search space would be enough to cover the
additional incremental operation costs of other modules.

3.1  An  Incremental  Parser 

The pre-existing parser in the dialogue system was a pure bottom-up
chart parser with a hand-built grammar suited for parsing
task-oriented dialogue. The grammar consisted of a context-free
backbone with a set of associated features and semantic
restrictions, including agreement, hard subcategorization
constraints, and soft selectional restriction preferences. The
parser has been modified so that whenever a constituent is built,
it can be sent forward to the Mediator, allowing for the
possibility of feedback. The architecture and experiments described
in this paper were performed in a synchronous mode, but the parser
can also operate in an incrementally asynchronous mode, where it
continues to build the chart in parallel with other modules’
operations; probability adjustments to the chart then cascade to
dependent constituents.

3.2  Interaction  with  Reference 

When the parser builds a potential referring expression (e.g. any
NP), it is immediately passed on to the Advisor, the reference
resolution module described in Tetreault et al. (2004) modified for
incremental interaction. This module then determines all possible
discourse referents, providing the parser with a ranked
classification based on the salience of the referents and the
(incremental) syntactic environment.

The reference module keeps a dynamically updated list of currently
salient discourse entities against which incoming incrementally
constructed NP constituents are matched. Before any utterances are
processed, the module loads a static database of relevant place
names in the domain; all other possible referents are discourse
entities which have been spoken of during the course of the
dialogue. For efficiency, the dynamic portion of the context list
is limited to the ten most recent contentful utterances;
human-annotated antecedent data for this corpus shows that 99% of
all pronoun antecedents fall within this threshold. After each
sentence is fully parsed, the context list is updated with new
discourse entities introduced in the utterance; ideally, these
context updates would also be incremental, but this feature was
omitted in the current version for simplicity.
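A bounded context list of this kind can be sketched with a fixed-length queue. The class below is an illustration under stated assumptions: the name `ContextList`, the place name "Rochester", and the reading of "contentful" as "introduces at least one entity" are hypothetical and not taken from the reference module itself.

```python
from collections import deque
from typing import Deque, Iterable, Set


class ContextList:
    """Static place names plus entities from the last `window` utterances."""

    def __init__(self, place_names: Iterable[str], window: int = 10) -> None:
        self.place_names: Set[str] = set(place_names)
        # deque(maxlen=...) silently discards the oldest utterance.
        self.recent: Deque[Set[str]] = deque(maxlen=window)

    def end_of_utterance(self, entities: Iterable[str]) -> None:
        """Update the context once an utterance has been fully parsed."""
        mentioned = set(entities)
        # Assumption: an utterance is "contentful" if it mentions an entity.
        if mentioned:
            self.recent.append(mentioned)

    def candidates(self) -> Set[str]:
        """All entities an incoming NP may currently be matched against."""
        result = set(self.place_names)
        for mentioned in self.recent:
            result |= mentioned
        return result


context = ContextList(["Rochester"])
for i in range(11):
    context.end_of_utterance([f"entity-{i}"])
context.end_of_utterance([])  # not contentful, so the window is unchanged
```

After eleven contentful utterances, the entity from the first one has dropped out of the window while the static place name remains available.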

The matching process is based on that described by Byron (2000),
and differs from that of many other reference modules in that every
entity and NP-constituent has a (possibly underspecified) semantic
feature vector, and it is both the logical and semantic forms which
determine successful matchings. Adding semantic information
increases the accuracy of the reference resolution from 44% to 58%
(Tetreault and Allen, 2004), and consequently improves the feedback
provided to the parser.

The Mediator receives the set of all possible referents, including
the semantic content of the referent and a classification of
whether the referent is the single salient entity in focus, has
previously been mentioned, or is a relevant place name.

3.3  Mediator 

The Mediator interprets the information received from reference and
determines how the parser’s chart should be modified. If the NP
matches nothing in the discourse context, no match is returned;
otherwise each referent is annotated with its type and discourse
distance, and this set is run through a classifier to reduce it to
a single tag. The resulting tag is the reference resolution tag, or
R. The NP constituents are also classified by definiteness and
number, giving an NP tag N.

For each classifier, we trained a probability model which
calculated $P_r$, the probability that a noun phrase constituent c
would be in the final parse, conditioned on R and N, or

    $P_r = p(c \text{ in final parse} \mid R, N).$

This probability was then linearly combined with the parser’s
constituent probability,

    $P_p = P(c \rightarrow w),$

according to the equation

    $P(c) = (1 - \lambda) \cdot P_p + \lambda \cdot P_r$

for various values of the interpolation parameter λ. Evaluation
using held-out data suggested that a value of λ = 0.2 would be
optimal. This style of feedback is an example of chart subversion,
as it is a direct modification of constituent probabilities by the
Mediator, defining a new probability distribution.
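As a quick numerical illustration of the linear combination, the sketch below interpolates a parser probability with a reference-based probability. The function name `combine`, the argument names, and the example inputs (0.5 and 1.0) are assumptions for illustration; only the default weight of 0.2 comes from the text, where it is the interpolation weight written λ here.

```python
def combine(p_parser: float, p_reference: float, lam: float = 0.2) -> float:
    """Linearly interpolate the parser and reference probabilities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("the interpolation weight must lie in [0, 1]")
    return (1.0 - lam) * p_parser + lam * p_reference


# A constituent the parser scores at 0.5 that reference strongly supports.
boosted = combine(0.5, 1.0)

# With a weight of 0 the parser's own probability is returned unchanged.
unchanged = combine(0.5, 0.9, lam=0.0)
```

With the weight at 0.2, strong support from reference raises the example score from 0.5 to roughly 0.6, so feedback nudges the parser's estimate rather than overriding it.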

4  Experiments 

The Monroe domain (Tetreault et al., 2004; Stent, 2001) is a series
of task-oriented dialogues between human participants set in a
simulated rescue operation domain, where participants
collaboratively plan responses to emergency calls. Dialogues were
recorded, broken up into utterances, and then transcribed by hand,
removing speech repairs from the parser input. These transcriptions
served as input for all experiments reported below.

A probabilistic grammar was trained from supervised data, assigning
PCFG probabilities for the rule expansions in the CFG backbone of
the handcrafted, semantically constrained grammar. The parser was
run using this grammar, but without any incremental interaction
whatsoever, in order to establish baseline accuracy and efficiency
numbers. The corpus consists of six task-oriented dialogues; four
were used for the PCFG training, one was held out to establish
appropriate parameter values, and one was selected for testing. The
held-out and test dialogues contain hand-checked gold standard
parses.

Under normal operation of the sequential dialogue system, the
parser is run in best-first mode, providing only a single analysis
to higher-level modules, and has a constituent construction limit
in an attempt to simulate the demands of a real-time system. When
the parser reaches the constituent limit, appropriate partial
analyses are collected and forwarded to higher-level modules. These
constraints were kept in place during our experiments, because they
would be necessary under normal operation of the system. Thus, the
inability to parse a sentence does not necessarily indicate a lack
of coverage of the grammar, but rather a lack of efficiency in the
parsing process.

                   Base    All NPs    Def-Sing
Precision (%)      94.6    97.2       96.3
Recall (%)         71.1    83.1       78.8
F-statistic        82.9    90.2       87.6
Improvement        N/A     7.3        4.7
Error Red. (%)     N/A     42.4       27.2
Work Red. (%)      N/A     30.3       18.7
Perfect S          224     241        236
Parsed S           270     282        279

Table 1: Results for (a) the baseline parser without reference
feedback, (b) an Oracle Advisor correctly determining status of all
NPs, (c) an Oracle Advisor correctly determining status of definite
singular NPs.

As can be seen in Table 1, the parser achieves a 94.6% labelled
bracket precision, and a 71.1% labelled bracket recall. Note that
only constituents of complete parses were checked against the gold
standard, to avoid any bias introduced by the partial parse
evaluation metric. Of the 290 gold standard utterances in the test
data, 270 could be parsed, and 224 were parsed perfectly.

4.1  Oracle  Evaluation 

We began with a feasibility study to determine how significant the
effects of incremental advice on noun phrases could be in
principle. The feedback from the reference module is designed to
determine whether particular NPs are good or bad from a reference
standpoint. We constructed a simple feedback oracle from supervised
data which determined, for each NP, whether or not the final parse
of the sentence contained an NP constituent which spanned the same
input. Those NPs marked “good”, which did appear in the parse, were
added to the chart as new constituents. NPs marked “bad” were added
to the chart with a probability of zero [1]. A second oracle
evaluation performed this same task, but provided feedback only on
definite singular NPs.

[1] In some sense, this style of feedback is an example of
heuristic subversion, as it has the effect of keeping “good”
analyses around while removing “bad” analyses from the search
space. Technically, this is also chart subversion, as each
hypothesis has its score multiplied by 1 or 0, depending on whether
it is “good” or “bad”. In this degenerate case of all-or-nothing
feedback, chart subversion and heuristic subversion are equivalent.

The results of both oracles are shown in Table 1. The first five
rows give the precision, recall, f-statistic, the raw f-statistic
improvement, and the f-statistic error reduction percentage, all
determined in terms of labelled bracket accuracy. There is a marked
increase in both precision and recall, with an overall error
reduction of 42.4% with the full oracle and 27.2% with the definite
singular oracle. Thus, in this domain over a quarter of all
incorrectly labelled constituents are attributable to syntactically
incorrect definite singular NPs. The number of constituents built
during the parse is used as a measure of efficiency, and the work
reduction is reported in the sixth row of the table, showing an
efficiency improvement of 30.3% or 18.7%, depending on the oracle.
The final two lines of the table show that both the number of
sentences which can be parsed and the number of sentences which are
perfectly parsed increase under both models.

The nature of the oracle experiment ensures some reduction in error
and complexity, but the magnitude of the improvement is surprising,
and certainly encouraging for the prospects of incremental
reference. Definite singular NPs typically have a unique referent,
providing a locus for effective feedback, and we believe that
incremental interaction with an accurate reference module might
approach the oracle performance.

4.2  Dialogue  Experiments

For these experiments the parser interacted with the actual
reference module, incorporating feedback according to the model
discussed in Section 3.3. The first data column of Table 2 repeats
the baseline results of the parser without reference feedback. The
next two columns show statistics for a run of the parser with
incremental feedback from reference, using a probability model
based on a classification scheme which distinguished only whether
or not the set of referent matches was empty. The second data
column shows the results for the estimated interpolation parameter
value of λ = 0.2, while the third data column shows results for the
empirically determined optimal λ value of 0.1.

The results are encouraging, with an error reduction of 8.2% or
9.3% on the test dialogue, although the amount of work the parser
performed was reduced by only 3.6% and 4.0%, respectively. A
further encouraging sign is that for every exploratory λ value we
tried in either the held-out or the test data, both the accuracy
and efficiency improved. Reference information also helped increase
both the number of sentences that could be parsed and the number of
sentences that were parsed perfectly, although the improvements
were small.

                   Base    SC      SC      CC
λ                  N/A     0.2     0.1     0.2
Precision (%)      94.6    94.5    94.8    93.9
Recall (%)         71.1    74.1    74.2    73.9
F-statistic        82.9    84.3    84.5    83.9
F-stat Imp.        N/A     1.4     1.6     1.0
Error Red. (%)     N/A     8.2     9.3     5.8
Work Red. (%)      N/A     3.6     4.0     4.6
Perfect S          224     225     228     223
Parsed S           270     273     273     273

Table 2: Results for Discourse Experiment with Simple (SC) and
Complex (CC) Classifiers
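The error-reduction rows in Tables 1 and 2 appear consistent with expressing the f-statistic gain as a fraction of the baseline's remaining error (100 minus the baseline f-statistic). That formula is an inference from the reported numbers rather than something stated in the text, and the function name below is hypothetical. Because the tables give rounded values, recomputing from them can differ slightly from the reported figures.

```python
def f_error_reduction(f_base: float, f_new: float) -> float:
    """Percentage of the baseline f-statistic error removed by f_new."""
    if f_base >= 100.0:
        raise ValueError("a baseline of 100 leaves no error to reduce")
    return 100.0 * (f_new - f_base) / (100.0 - f_base)


# Baseline f-statistic 82.9 against the simple classifier's 84.3.
simple_classifier = f_error_reduction(82.9, 84.3)
```

For the simple classifier at an interpolation weight of 0.2 this gives about 8.2, in line with the corresponding entry of Table 2.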

The estimated value of λ = 0.2 produced an error reduction that was
approximately 20% of the oracular, which is a very good start,
especially considering that this experiment used only the
information of whether there was a referent match or not. The
efficiency gains were more modest at just above 10% of the oracular
results, although one would expect less radical efficiency
improvements from this experiment, since under the linear
interpolation of the experiment, even extremely dispreferred
analyses may be expanded, whereas the oracle simply drops all
dispreferred NPs off the beam immediately.

We performed a second experiment that made more complete use of the
reference data, breaking down referent sets according to when and
how often they were mentioned, whether they matched the focus, and
whether they were in the set of relevant place names. We expected
that this information would provide considerably better results
than the simple match/no-match classification above. For example,
consider a definite singular NP: if it matches a single referent,
one would expect it to be in the parse with high probability, but
multiple matches would indicate that the referent was not unique,
and that the base noun probably requires additional discriminating
information (e.g. a prepositional phrase or restrictive relative
clause).

Unfortunately, as the final column of Table 2 shows, the additional
information did not provide much of an advantage. The amount of
work done was reduced by 4.6%, the largest of any efficiency
improvement, but error reduction was only 5.8%, and the number of
sentences parsed perfectly actually decreased by one.

We conjecture that co-reference chains may be a significant source
of confusion in the reference data. Ideally, if several entities in
the discourse context all refer to the same real-world entity, they
should be counted as a single match. The current reference module
does construct co-referential chains, but a single error in
co-reference identification will cause all future NPs to match both
the chain and the misidentified item, instead of producing the
single match desired.

The reference module has to rely on the parser to provide the
correct context, so there is something of a bootstrapping problem
at work, which indicates both a drawback and a potential of this
type of incremental interaction. The positive feedback loop bodes
well for the potential benefits of the incremental system, because
as the incremental reference information begins to improve the
performance of the parser, the context provided to the reference
resolution module improves, which provides even more accurate
reference information. Of course, in the early stages of such a
system, this works against us; many of the reference resolution
errors could be a result of the poor quality of the discourse
context.

Our  current  efforts  aim  to  identify  and  correct 
these  and  other  reference  resolution  issues.  Not  only 
will  this  improve  the  performance  of  the  Reference 
Advisor  from  an  incremental  parsing  standpoint,  but 
it  should  also  further  our  understanding  of  reference 
resolution  itself. 

We have shown efficiency improvements in terms of the overall
number of constituents constructed by the parser; however, one
might ask whether this improvement in parsing speed comes at a
large cost to the overall efficiency of the system. We suggest that
this is in some sense the wrong question to ask, because for a
real-time interactive system the primary concern is to keep up with
the human interlocutor, and the incremental approach offers a far
greater opportunity for parallelism between modules. In terms of
time elapsed from speech to analysis, the system as a whole should
benefit from the incremental architecture.

5  Semantic  Replacement 

When the word “it” is parsed as a referential NP, it is given
highly underspecified semantics. We have implemented a Mediator
which, for each possible referent for “it”, adds a new item to the
parser’s chart with the underspecified semantics of “it”
instantiated to the semantics of the referent.

Consider  the  sentence  sequence  “Send  the  bus  to 
the  hospital”,  “Send  it  to  the  mall”.  At  the  point
that  the  NP  “it”  is  encountered  in  the  second  sentence,
it  has  not  yet  been  connected  to  the  verb,
so  the  incremental  reference  resolution  determines
that  “the  bus”  and  “the  hospital”  are  both  possible
referents.  We  add  two  new  constituents  to  the
chart:  “it”[the  hospital]  and  “it”[the  bus].  They
are  given  probabilities  infinitesimally  higher  than
the  “it”[underspecified]  which  already  exists  on  the
chart.  Thus,  if  either  of  the  new  versions  of  “it”
matches  the  semantic  restrictions  inherent  in  the  rest
of  the  parse,  it  will  be  featured  in  parses  with  a
higher  probability  than  the  underspecified  version.
“It”[the  bus]  matches  the  mobility  required  of  the 
object  of  “send”,  while  “it”[the  hospital]  does  not. 
This  results  in  a  parse  where  the  semantics  of  “it” 
are  instantiated  early  and  incrementally.
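
The replacement step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the system: the names ChartItem, instantiate_pronoun, EPSILON and the toy MOBILE set are all hypothetical, and a small constant stands in for the "infinitesimally higher" probability.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed stand-in for the "infinitesimally higher" probability boost.
EPSILON = 1e-9


@dataclass(frozen=True)
class ChartItem:
    """Hypothetical, much simplified chart constituent."""
    words: str
    category: str
    semantics: Optional[str]  # None means underspecified
    prob: float


def instantiate_pronoun(chart: List[ChartItem],
                        pronoun: ChartItem,
                        referents: List[str]) -> List[ChartItem]:
    """Add one instantiated copy of the pronoun per candidate referent.

    The underspecified item stays on the chart; each copy gets a
    slightly higher probability so it is preferred when it fits.
    """
    new_items = [
        ChartItem(pronoun.words, pronoun.category, ref, pronoun.prob + EPSILON)
        for ref in referents
    ]
    chart.extend(new_items)
    return new_items


# Toy world knowledge (assumption): only the bus can be moved.
MOBILE = {"the bus"}


def satisfies_send_object(item: ChartItem) -> bool:
    """The object of "send" must be mobile; underspecified items pass."""
    return item.semantics is None or item.semantics in MOBILE


def best_object(chart: List[ChartItem]) -> ChartItem:
    """Pick the most probable "it" that meets the restriction of "send"."""
    candidates = [i for i in chart
                  if i.words == "it" and satisfies_send_object(i)]
    return max(candidates, key=lambda i: i.prob)


chart = [ChartItem("it", "NP", None, 0.5)]
instantiate_pronoun(chart, chart[0], ["the bus", "the hospital"])
print(best_object(chart).semantics)
```

In this toy setting, “it”[the hospital] is filtered out by the mobility restriction, and “it”[the bus] outranks the underspecified item only because of the small boost, which mirrors the behaviour described in the text.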

This  sort  of  capability  is  key  for  an  end-to-end 
incremental  system,  because  neither  the  reference 
module  nor  the  parser  is  capable,  by  itself,  of  deter¬ 
mining  incrementally  that  the  reference  in  question 
must  be  “the  bus”.  If  we  want  an  end-to-end  system 
which  can  interact  incrementally  with  the  user,  this 
type  of  decision-making  must  be  made  in  an  incre¬ 
mental  fashion. 

This  ability  is  also  key  in  the  presence  of  soft  con¬ 
straints  or  other  Advisors  which  prefer  one  possi¬ 
ble  moveable  referent  to  another;  under  incremental 
parsing,  these  constraints  would  have  the  chance  to 
be  applied  during  the  parsing  process,  whereas  a  se¬ 
quential  system  has  no  alternatives  to  the  default, 
underspecified  pronoun,  and  so  cannot  apply  these 
restrictions  to  discriminate  between  referents. 

Our  implementation  performs  the  semantic  vet¬ 
ting  discussed  above,  but  we  have  done  no  large- 
scale  experiments  in  this  area. 

6  Related  Work 

There  are  instances  in  the  literature  of  incremental 
parsers  that  pass  forward  information  to  higher-level 
modules,  but  none,  to  our  knowledge,  are  designed
as  continuous  understanding  systems,  where  all  lev¬ 
els  of  language  analysis  occur  (virtually)  simultane¬ 
ously. 

For  example,  there  are  a  number  of  robust  seman¬ 
tic  processing  systems  (Pinkal  et  al.,  2000;  Rose, 
2000;  Worm,  1998;  Zechner,  1998)  which  contain 
incremental  parsers  that  pass  on  partial  results  im¬ 
mediately  to  the  robust  semantic  analysis  compo¬ 
nent,  which  begins  to  work  on  combining  these 
sentence  fragments.  If  the  parser  cannot  find  a 
parse,  then  the  semantic  analysis  program  has  already
done  at  least  part  of  its  work.  However,  none
of  the  above  systems  have  a  feedback  loop  between 
the  semantic  analysis  component  and  the  incremental
parser.  So,  while  all  of  these  are  in  some  sense
examples  of  incremental  parsing,  they  are  not
continuous  understanding  models.

Schuler  (2002)  describes  a  parser  which  builds
both  a  syntactic  tree  and  a  denotation-based  semantic
analysis  as  it  parses.  The  denotations  of  constituents
in  the  environment  are  used  to  inform  parsing
decisions,  much  as  we  use  the  static  database  of
place  names.  However,  the  feedback  in  our  system 
is  richer,  based  on  the  context  provided  by  the  pre¬ 
ceding  discourse.  Furthermore,  as  an  instantiation 
of  the  general  architecture  presented  in  Section  2, 
our  system  is  more  easily  extensible  to  other  forms 
of  feedback. 

7  Future  Work 

There  is  a  catch-22  in  that  the  accurate  reference
information  necessary  to  improve  parsing  accuracy  is
dependent  on  an  accurate  discourse  context  which 
is  reliant  on  accurate  parsing.  One  way  to  cut  this 
Gordian  Knot  is  to  use  supervised  data  to  ensure  that 
the  discourse  context  in  the  reference  module  is  up¬ 
dated  with  the  gold  standard  parse  of  the  sentence 
rather  than  the  parse  chosen  by  the  parser;  a  context 
oracle,  if  you  will. 

A  major  undertaking  necessary  to  advance  this 
work  is  an  error  analysis  of  the  reference  module 
and  of  the  parser’s  response  to  feedback;  when  does 
feedback  lead  to  additional  work  or  decreased  ac¬ 
curacy  on  the  part  of  the  incremental  parser,  and  is 
the  feedback  that  leads  to  these  errors  correct  from
a  reference  standpoint? 

Currently,  the  accuracy  of  the  parser  is  couched 
in  syntactic  terms.  The  precision  of  the  baseline 
PCFG  is  fairly  high  at  94.6%,  but  that  could  conceal 
semantic  errors,  which  could  be  corrected  with  reference
information.  Assessing  semantic  accuracy  is
one  of  a  number  of  alternative  evaluation  metrics 
that  we  are  exploring. 

We  intend  to  gather  timing  data  and  investigate 
other  efficiency  metrics  to  determine  to  what  extent 
the  efficiency  gains  in  the  parser  offset  the  commu¬ 
nication  overhead  and  the  extra  work  performed  by 
the  reference  module. 

We  also  plan  to  do  experiments  with  different 
feedback  regimes,  experimenting  both  with  the  ac¬ 
tual  reference  results  and  with  the  oracle  data.  Fur¬ 
ther  experiments  with  this  oracle  data  should  enable 
us  to  appropriately  parameterize  the  linear  interpo¬ 
lation,  and  indeed,  to  investigate  whether  linear  in¬ 
terpolation  itself  is  a  productive  feedback  scheme, 
or  whether  an  integrated  probability  distribution 
over  parser  and  reference  judgments  is  more  effective.
The  latter  scheme  is  not  only  more  elegant,  but
can  also  be  shown  to  produce  probabilities  equivalent
to  those  assigned  parses  in  the  parse  re-ranking
task  (Stoness,  2004).
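
As a rough illustration of what parameterizing the linear interpolation might involve, the sketch below blends a parser probability with an Advisor judgment using a single weight. The exact form of the interpolation is not restated in this section, so the convex combination, the function name interpolate and the weight lam are assumptions made for illustration only.

```python
def interpolate(parser_prob: float, advisor_score: float, lam: float) -> float:
    """Blend a parser probability with an Advisor score.

    Assumed form: a convex combination, where lam = 0 ignores the
    Advisor entirely and lam = 1 ignores the parser.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * parser_prob + lam * advisor_score


# Sweep a few candidate weights for one constituent that the
# (hypothetical) Advisor judges suitable, i.e. gives a score of 1.0.
for lam in (0.0, 0.25, 0.5, 1.0):
    print(lam, interpolate(0.2, 1.0, lam))
```

Running such a sweep against the oracle data is one way the weight could be tuned; whether a scheme of this shape is adequate at all is the open question raised above.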

We  have  shown  (Stoness,  2004)  that  feedback
which  punishes  constituents  that  are  not  in  the  final
parse  cannot  result  in  reduced  accuracy  or  efficiency;
under  certain  restrictions,  the  same  holds  of
rewarding  constituents  that  will  be  in  the  final  parse. 
However,  it  is  not  clear  how  quickly  the  efficiency 
and  accuracy  gains  drop  off  as  errors  mount.  By  in¬ 
troducing  random  mistakes  into  the  Oracle  Advisor, 
we  can  artificially  achieve  any  desired  level  of  accu¬ 
racy,  which  will  enable  us  to  explore  the  character¬ 
istics  of  this  curve.  The  accuracy  and  efficiency  re¬ 
sponse  under  error  has  drastic  consequences  on  the 
types  of  Advisors  that  will  be  suitable  under  this  ar¬ 
chitecture. 
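
One simple way to introduce random mistakes at a chosen rate is to flip the gold judgment with a fixed probability. The sketch below assumes binary suitability judgments and independent errors; the function name noisy_oracle and its parameters are hypothetical and do not describe the Oracle Advisor's actual interface.

```python
import random


def noisy_oracle(gold_judgment: bool, accuracy: float,
                 rng: random.Random) -> bool:
    """Return the gold judgment with probability `accuracy`, else its negation.

    Assumes errors are independent across constituents.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return gold_judgment if rng.random() < accuracy else not gold_judgment


# Simulate an Advisor that is right about 80% of the time.
rng = random.Random(0)  # fixed seed so runs are repeatable
judgments = [noisy_oracle(True, 0.8, rng) for _ in range(10000)]
print(sum(judgments) / len(judgments))
```

Sweeping the accuracy parameter from 1.0 downwards and re-running the parser at each setting would trace out the accuracy and efficiency response curve discussed above.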

Finally,  it  is  clear  that  finding  only  the  discourse 
context  referents  of  a  noun  phrase  is  not  sufficient; 
intuitively,  and  as  shown  by  Schuler  (2002),  real- 
world  referents  can  also  aid  in  the  parsing  task.  We 
intend  to  enhance  the  reference  resolution  compo¬ 
nent  of  the  system  to  identify  both  discourse  and 
real-world  referents. 

8  Conclusion 

These  preliminary  experiments,  using  the  coars¬ 
est  grain  of  reference  information  possible,  achieve 
a  significant  fraction  of  the  oracular  accuracy  im¬ 
provements,  highlighting  the  potential  benefits  of 
incremental  interaction  between  the  parser  and  ref¬ 
erence  in  a  continuous  understanding  system. 

The  Oracle  feedback  for  NPs  shows  that  it  is  pos¬ 
sible  to  simultaneously  improve  both  the  accuracy 
and  efficiency  of  an  incremental  parser,  providing  a 
proof-in-principle  for  the  general  incremental  pro¬ 
cessing  architecture  we  introduced.  This  architec¬ 
ture  holds  great  promise  as  a  platform  for  instantiat¬ 
ing  the  wide  range  of  interactions  necessary  for  true 
continuous  understanding. 